feat: add benchmark framework for collection mount performance#7915
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR introduces a comprehensive benchmarking infrastructure for measuring collection mount performance across operating systems. It adds GitHub Actions workflows, Playwright benchmark configuration, collection generation utilities, statistical helpers, result comparison tooling, PR integration, and baseline metrics for Ubuntu, macOS, and Windows. ChangesBenchmark Infrastructure & Test Suite
Sequence DiagramsequenceDiagram
participant GHA as GitHub Actions
participant CAct as Composite Action
participant Pw as Playwright Runner
participant Bench as Benchmark Test
participant CmpJS as compare.js
participant CommentJS as pr-comment.js
participant GHApi as GitHub API
GHA->>CAct: Trigger with os & update-baseline
CAct->>Pw: Run benchmarks (with xvfb on Linux)
Pw->>Bench: Execute mount tests
Bench->>Bench: Measure timing, generate results.json
CAct->>CmpJS: Load results & baseline
alt update-baseline == true
CmpJS->>CmpJS: Compute new baseline from results
CmpJS->>Pw: Write baseline.json
else compare mode
CmpJS->>CmpJS: Compare against baseline
CmpJS->>CmpJS: Check for regressions
end
GHA->>GHA: Upload results & baseline artifacts
alt Pull Request event
GHA->>CommentJS: Load results & baseline
CommentJS->>CommentJS: Build comparison table
CommentJS->>GHApi: Post/update PR comment
GHApi->>GHApi: Display benchmark results
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (5)
tests/benchmarks/mounting/collection-mount.bench.ts (2)
81-81: ⚡ Quick winReplace
page.waitForTimeout(500)with a deterministic wait.Per coding guidelines,
page.waitForTimeout()should only be used when no suitableexpect()locator-based wait is available.If
closeAllCollectionsleaves a detectable UI change (e.g., the sidebar collection list becomes empty), prefer:- await page.waitForTimeout(500); + await expect(page.locator('#sidebar-collection-name')).toHaveCount(0);If there's genuinely no stable locator to await after close, add a brief comment justifying the timeout.
As per coding guidelines: "Try to reduce usage of
page.waitForTimeout();in code unless absolutely necessary and the locator cannot be found using existingexpect()playwright calls."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/collection-mount.bench.ts` at line 81, Replace the non-deterministic sleep (page.waitForTimeout(500)) in the collection-mount benchmark with a locator-based expect or a justified comment: after calling closeAllCollections() wait for a concrete UI change (e.g., expect(sidebarCollectionList).toBeEmpty() or expect(collectionItem.locator('text=...')).not.toBeVisible()) that signals completion; if there truly is no stable DOM change to await, keep a very brief comment explaining why a fixed timeout is required and reduce the timeout to the minimum necessary while noting the reason.
70-95: ⚡ Quick winWrap test body steps with
test.stepfor clearer reports.The inner test body performs distinct phases — generate, measure (× N), record, log — but has no
test.stepwrapping, making failure traces harder to read.♻️ Example restructuring with test.step
test(`mount ${format} collection with ${size} requests`, async ({ page, electronApp, createTmpDir }) => { test.setTimeout((2 + Math.ceil(size / 100) * 2) * 60_000); const timings: number[] = []; for (let i = 0; i < ITERATIONS_PER_SIZE; i++) { - const collectionName = `bench-${format}-${size}-iter-${i}`; - const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`); - generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format }); - - const elapsed = await measureCollectionMount(page, electronApp, collectionDir, collectionName); - timings.push(Math.round(elapsed)); - await page.waitForTimeout(500); + await test.step(`iteration ${i + 1}`, async () => { + const collectionName = `bench-${format}-${size}-iter-${i}`; + const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`); + await test.step('generate collection', () => { + generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format }); + }); + const elapsed = await test.step('measure mount', () => + measureCollectionMount(page, electronApp, collectionDir, collectionName) + ); + timings.push(Math.round(elapsed)); + }); }As per coding guidelines: "Promote the use of
test.stepas much as possible so the generated reports are easier to read."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/collection-mount.bench.ts` around lines 70 - 95, The test body for the `test("mount ${format} collection with ${size} requests", ...)` case should be split into explicit Playwright `test.step` calls to improve report clarity: wrap the collection generation loop (calls to `generateCollection` and `createTmpDir`) in a "generate collections" step, wrap each measurement call to `measureCollectionMount` (and the per-iteration wait) inside a "measure mount (iteration i)" step or a single "measure mounts" step, and wrap result aggregation/annotation (uses of `timings`, `resultKey`, `summarize`, and `test.info().annotations.push`) and the final console log into a "record results" step; update the test body (the anonymous async function passed to `test(...)`) to call `await test.step(name, async () => { ... })` around those phases so failures show as separate steps while preserving existing logic and variables.tests/benchmarks/utils/compare.js (1)
62-62: ⚡ Quick win
baseline.collectionsfallback is undocumented dead code.The baseline schema only defines
entries; thebaseline.collectionsfallback has no corresponding writer and will silently swallow a missingentrieskey instead of failing loudly.-const baselineEntries = baseline.entries || baseline.collections || {}; +const baselineEntries = baseline.entries || {};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/utils/compare.js` at line 62, Replace the undocumented fallback that uses baseline.collections with a strict requirement for baseline.entries: change the initialization of baselineEntries (the variable named baselineEntries) to only read baseline.entries and add a guard that throws (or returns a clear error) when baseline.entries is missing so the function fails loudly instead of silently using a dead-path; this ensures the code path in compare.js validates the expected schema rather than swallowing a missing entries key.tests/benchmarks/utils/pr-comment.js (1)
29-33: ⚡ Quick win
pctis a string — parse it before comparing to numbers.
.toFixed(1)returns astring, so all comparisons on lines 30–31 and 33 (pct > threshold,pct < -threshold,pct > 0) silently rely on JS coercion. Whenbase.meanis0,pctbecomes"Infinity"(or potentially"NaN") and the coercion is unreliable. Keeppctnumeric; format only at the point of interpolation.♻️ Suggested fix
- const pct = ((data.mean - base.mean) / base.mean * 100).toFixed(1); - const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK'; - if (pct > threshold) hasRegression = true; - - body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pct}% | ${status} |\n`; + const pct = (data.mean - base.mean) / base.mean * 100; + const pctStr = pct.toFixed(1); + const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK'; + if (pct > threshold) hasRegression = true; + + body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pctStr}% | ${status} |\n`;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/utils/pr-comment.js` around lines 29 - 33, pct is being set to a string via .toFixed(1) and then compared numerically; change the code so pct is a numeric value used for comparisons (compute pct = (data.mean - base.mean) / base.mean * 100 without .toFixed), perform numeric comparisons against threshold and -threshold (affecting the checks using pct in the hasRegression assignment and status calculation), and only format pct with toFixed(1) when building the output interpolation for body (e.g., use a formattedPct variable for the string). Ensure the same named symbols are updated: pct (numeric), hasRegression, status, and the body interpolation where `${pct}` is inserted.tests/benchmarks/mounting/baseline.json (1)
1-46: ⚡ Quick winPlaceholder baseline values —
bru-*andyml-*entries are identical, which may mask format-specific regressions.All ten entries share the same
mean/p50figures (e.g., bothbru-50andyml-50havemean: 2000). Committing equal baselines for two formats that likely have distinct I/O characteristics means the comparison is not meaningful until real numbers replace these. Consider running the suite once on a reference machine and replacing these with the actual measured medians before merging, or at minimum add a prominent"status": "placeholder"field and fail CI baseline comparison when that flag is present.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/baseline.json` around lines 1 - 46, The baseline file currently uses placeholder identical values for all entries (see entries "bru-50"..."bru-5000" and "yml-50"..."yml-5000"), which masks format-specific regressions; replace these placeholders by running the benchmark on a reference machine and updating the entries with the real measured mean/p50 (use the provided script node tests/benchmarks/mounting/compare.js --update-baseline) so each "bru-*" and "yml-*" has accurate numbers, or if you cannot run benchmarks yet add a top-level "status":"placeholder" flag in the JSON and update CI to fail baseline comparisons when status is "placeholder" to prevent merging until real values are recorded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/benchmarks.yml:
- Around line 61-73: The "Comment Benchmark Results on PR" workflow step (uses:
actions/github-script@v7 and calling run({ resultsPath:
'tests/benchmarks/results/mounting.json', baselinePath:
'tests/benchmarks/mounting/baseline.json', title: 'Benchmark Results —
Collection Mount' })) can fail with 403 for forked PRs; update that step to
include continue-on-error: true to avoid failing the whole job on permission
errors (alternatively replace the event with a secure
pull_request_target/workflow_run pattern if you need to support writing comments
from privileged runs).
In `@tests/benchmarks/utils/collection-generator.ts`:
- Around line 46-47: The ymlContent produced by stringifyCollection may be
null/undefined and is written directly via fs.writeFileSync, so add a fallback
like the existing one for the 'bru' output: compute ymlContent =
stringifyCollection(...) || `meta: { name: "${name}", type: "collection",
opencollection: "1.0.0" }` (or a minimal valid YAML string), then pass that safe
ymlContent to fs.writeFileSync; update references to stringifyCollection and
ymlContent accordingly so opencollection.yml never contains "undefined"/"null".
In `@tests/benchmarks/utils/compare.js`:
- Line 21: The file uses ESM import syntax (import { existsSync, readFileSync,
writeFileSync } from 'fs') inside tests/benchmarks/utils/compare.js which causes
a SyntaxError under default CommonJS Node; either rename compare.js →
compare.mjs and update .github/workflows/benchmarks.yml to invoke the .mjs path
(Option A), or convert the file to CommonJS by replacing the import with const {
existsSync, readFileSync, writeFileSync } = require('fs') and ensure any other
ESM syntax is changed accordingly (Option B); pick one option and make the
matching change to the CI workflow if you rename the file.
In `@tests/benchmarks/utils/pr-comment.js`:
- Around line 77-78: The code reads baselinePath without checking existence,
causing an uncaught ENOENT; wrap the baseline read in the same guard/handling
used for resultsPath — either check fs.existsSync(baselinePath) before
JSON.parse or perform the fs.readFileSync/JSON.parse inside a try/catch, and on
error emit a clear log/error message and exit or return; update the lines that
call JSON.parse(fs.readFileSync(baselinePath, 'utf-8')) (and any surrounding
logic using baseline) to use this existence check or try/catch so missing or
invalid baseline files produce a helpful message instead of an unhandled
exception.
- Around line 46-52: The current call using github.rest.issues.listComments(...)
and then searching comments.find(...) can miss the existing marker on PRs with
>100 comments; update the call to either include per_page: 100 and handle
pagination, or replace the listComments call with
github.paginate(github.rest.issues.listComments, { owner: context.repo.owner,
repo: context.repo.repo, issue_number: context.issue.number, per_page: 100 })
and then search the returned array for the marker (remove the .data unwrap since
paginate returns items directly); adjust the variable currently named comments
and the comments.find(...) usage accordingly so duplicate benchmark comments are
not created.
---
Nitpick comments:
In `@tests/benchmarks/mounting/baseline.json`:
- Around line 1-46: The baseline file currently uses placeholder identical
values for all entries (see entries "bru-50"..."bru-5000" and
"yml-50"..."yml-5000"), which masks format-specific regressions; replace these
placeholders by running the benchmark on a reference machine and updating the
entries with the real measured mean/p50 (use the provided script node
tests/benchmarks/mounting/compare.js --update-baseline) so each "bru-*" and
"yml-*" has accurate numbers, or if you cannot run benchmarks yet add a
top-level "status":"placeholder" flag in the JSON and update CI to fail baseline
comparisons when status is "placeholder" to prevent merging until real values
are recorded.
In `@tests/benchmarks/mounting/collection-mount.bench.ts`:
- Line 81: Replace the non-deterministic sleep (page.waitForTimeout(500)) in the
collection-mount benchmark with a locator-based expect or a justified comment:
after calling closeAllCollections() wait for a concrete UI change (e.g.,
expect(sidebarCollectionList).toBeEmpty() or
expect(collectionItem.locator('text=...')).not.toBeVisible()) that signals
completion; if there truly is no stable DOM change to await, keep a very brief
comment explaining why a fixed timeout is required and reduce the timeout to the
minimum necessary while noting the reason.
- Around line 70-95: The test body for the `test("mount ${format} collection
with ${size} requests", ...)` case should be split into explicit Playwright
`test.step` calls to improve report clarity: wrap the collection generation loop
(calls to `generateCollection` and `createTmpDir`) in a "generate collections"
step, wrap each measurement call to `measureCollectionMount` (and the
per-iteration wait) inside a "measure mount (iteration i)" step or a single
"measure mounts" step, and wrap result aggregation/annotation (uses of
`timings`, `resultKey`, `summarize`, and `test.info().annotations.push`) and the
final console log into a "record results" step; update the test body (the
anonymous async function passed to `test(...)`) to call `await test.step(name,
async () => { ... })` around those phases so failures show as separate steps
while preserving existing logic and variables.
In `@tests/benchmarks/utils/compare.js`:
- Line 62: Replace the undocumented fallback that uses baseline.collections with
a strict requirement for baseline.entries: change the initialization of
baselineEntries (the variable named baselineEntries) to only read
baseline.entries and add a guard that throws (or returns a clear error) when
baseline.entries is missing so the function fails loudly instead of silently
using a dead-path; this ensures the code path in compare.js validates the
expected schema rather than swallowing a missing entries key.
In `@tests/benchmarks/utils/pr-comment.js`:
- Around line 29-33: pct is being set to a string via .toFixed(1) and then
compared numerically; change the code so pct is a numeric value used for
comparisons (compute pct = (data.mean - base.mean) / base.mean * 100 without
.toFixed), perform numeric comparisons against threshold and -threshold
(affecting the checks using pct in the hasRegression assignment and status
calculation), and only format pct with toFixed(1) when building the output
interpolation for body (e.g., use a formattedPct variable for the string).
Ensure the same named symbols are updated: pct (numeric), hasRegression, status,
and the body interpolation where `${pct}` is inserted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b03d3316-2bd5-4d22-9187-5e9061d11d3b
📒 Files selected for processing (13)
.github/actions/tests/run-benchmark-tests/action.yml.github/workflows/benchmarks.yml.gitignorepackage.jsonplaywright.benchmark.config.tsplaywright.config.tstests/benchmarks/mounting/baseline.jsontests/benchmarks/mounting/collection-mount.bench.tstests/benchmarks/utils/collection-generator.tstests/benchmarks/utils/compare.jstests/benchmarks/utils/pr-comment.jstests/benchmarks/utils/results.tstests/benchmarks/utils/stats.ts
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)
75-87:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent fork PR permission errors from failing the benchmark job
Line 75–87 can still fail on fork-originated
pull_requestruns when PR comment write is blocked. Make the comment step non-blocking so benchmark execution/artifacts remain available.Suggested patch
- name: Comment Benchmark Results on PR if: github.event_name == 'pull_request' && !cancelled() && matrix.os-name == 'ubuntu' + continue-on-error: true uses: actions/github-script@v7 with: script: | const run = require('./tests/benchmarks/utils/pr-comment.js'); await run({🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmarks.yml around lines 75 - 87, The "Comment Benchmark Results on PR" step currently calls the pr-comment script and can fail on forked PRs when comment permissions are blocked; make this step non-blocking by adding continue-on-error: true to that step (the step with name "Comment Benchmark Results on PR" that uses actions/github-script@v7 and invokes the run(...) call) so failures to post the comment won't fail the whole job and artifacts/results remain available.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In @.github/workflows/benchmarks.yml:
- Around line 75-87: The "Comment Benchmark Results on PR" step currently calls
the pr-comment script and can fail on forked PRs when comment permissions are
blocked; make this step non-blocking by adding continue-on-error: true to that
step (the step with name "Comment Benchmark Results on PR" that uses
actions/github-script@v7 and invokes the run(...) call) so failures to post the
comment won't fail the whole job and artifacts/results remain available.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9bd73f9-89e6-4048-a88a-9f2e6dad5879
📒 Files selected for processing (3)
.github/workflows/benchmarks.ymltests/benchmarks/mounting/baseline.jsontests/benchmarks/mounting/collection-mount.bench.ts
✅ Files skipped from review due to trivial changes (1)
- tests/benchmarks/mounting/baseline.json
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/benchmarks/mounting/collection-mount.bench.ts
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/benchmarks.yml:
- Around line 67-73: Replace the per-matrix push in the "Commit Updated
Baseline" step with an artifacts workflow: in the matrix job (the step keyed by
name "Commit Updated Baseline" that currently checks
github.event.inputs.update-baseline == 'true' and references ${ { matrix.os-name
} }), stop running git commit/push and instead upload the updated baseline file
as an artifact named with the OS (use actions/upload-artifact, upload
tests/benchmarks/mounting/baseline.${{ matrix.os-name }}.json). Add a single
downstream job (e.g., "merge-baseline-updates") that runs once when
github.event.inputs.update-baseline == 'true', depends on the matrix job(s) via
needs, downloads all per-OS artifacts with actions/download-artifact, places
them into tests/benchmarks/mounting/, then performs git config, a single git
add/commit -m "chore: update baseline" and git push so one atomic commit handles
all baseline files and avoids concurrent non-fast-forward failures.
- Around line 67-73: The commit step named "Commit Updated Baseline" currently
uses the chained shell expression `git diff --staged --quiet || git commit -m
... && git push`, which can run `git push` even when no commit was created;
replace this operator chaining with an explicit conditional that checks the
staged-diff result and only runs the commit-and-push sequence when changes exist
(i.e., test `git diff --staged --quiet` and, if it indicates changes, run `git
commit -m "chore: update ${{ matrix.os-name }} benchmark baseline"` followed by
`git push`), ensuring push is executed only after a successful commit.
In `@tests/benchmarks/mounting/baseline.windows.json`:
- Around line 3-43: The baseline JSON "entries" currently only includes sizes up
to "bru-3000" and "yml-3000", but the benchmark coverage should span 50–5000;
run the mount benchmarks for the 5000-size collections and add new baseline
records named "bru-5000" and "yml-5000" to the entries object (including mean
and p50 stats) so regression detection covers the largest collection size.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c3f77dd-22e8-4b5e-bcbc-78927b8e9bd9
📒 Files selected for processing (5)
.github/actions/tests/run-benchmark-tests/action.yml.github/workflows/benchmarks.ymltests/benchmarks/mounting/baseline.macos.jsontests/benchmarks/mounting/baseline.ubuntu.jsontests/benchmarks/mounting/baseline.windows.json
✅ Files skipped from review due to trivial changes (2)
- tests/benchmarks/mounting/baseline.macos.json
- tests/benchmarks/mounting/baseline.ubuntu.json
6dbaaf0 to
69e92bc
Compare
There was a problem hiding this comment.
Things to keep in mind for the next set of tasks
-
The action's PR comment might fail, please bring it to my notice once merged so that I can update the token to have the needed perms to do that.
-
A benchmark will need more than 1 run for each collection set. The mean, median, p95, p99 right now just based on 1 run per collection variant and that's not exactly apt for a fair benchmark, there's 3 iterations being used for the compute.
f862b30 to
30ef913
Compare
- Playwright benchmark tests measuring collection mount time across bru/yml formats and sizes (50-5000 requests) - IPC listener approach for precise mount-complete signal - Generic benchmark utils: stats, results I/O, baseline comparison, PR commenting - Collection generator using @usebruno/filestore serializers - CI workflow running on ubuntu, macos, and windows with PR comment reporting - Regression detection against committed baselines with configurable threshold
…fix review issues - Same collection mounted/unmounted across iterations for cold vs cached comparison - workflow_dispatch has update-baseline boolean input for manual baseline updates - Fix string comparison bug in pr-comment.js (pct was string from toFixed) - Remove dead baseline.collections fallback in compare.js and pr-comment.js - Remove unnecessary waitForTimeout between iterations - Rename pct/pctChange to changePercent/percentChange for readability
…mmit on update-baseline - Reduce max collection size from 5000 to 3000 to keep CI runtime reasonable - Update baseline values from actual CI run data (worst case across ubuntu/macos/windows) - Auto-commit updated baseline.json when update-baseline is triggered via workflow_dispatch - Reuse same collection across iterations for cold vs cached comparison - Fix string comparison bug and remove dead code from review feedback - Rename pct variables to changePercent for readability - Remove unnecessary waitForTimeout between iterations
- Add continue-on-error to PR comment step since GITHUB_TOKEN lacks write access on cross-fork PRs
- Split baseline.json into baseline.ubuntu/macos/windows.json with real CI data - Action and workflow dynamically reference baseline per OS - PR comment posted per OS with OS-specific comparison - Auto-commit updated baseline on workflow_dispatch with update-baseline flag
…ults - writeResults now accepts SuiteMeta with name, unit, and direction - Results JSON includes suite field for the visualization dashboard to ingest - Mounting benchmark outputs unit: ms, direction: smaller
30ef913 to
29668da
Compare
Description
JIRA
Contribution Checklist:
Note: Keeping the PR small and focused helps make it easier to review and merge. If you have multiple changes you want to make, please consider submitting them as separate pull requests.
Publishing to New Package Managers
Please see here for more information.
Summary by CodeRabbit
New Features
test:benchmarknpm script for running performance tests.Chores
.gitignore.